Large Matrix Multiplication on a Novel Heterogeneous Parallel DSP Architecture
Authors
Abstract
This paper introduces a novel master-multi-SIMD on-chip multi-core architecture for embedded signal processing and describes the parallel architecture and its memory subsystem. We evaluate the performance of large matrix multiplication on this architecture and compare it with a SIMD-extended data-parallel architecture. We also examine how well the new architecture scales with the number of SIMD co-processors. The experimental results show that the ePUMA architecture's memory subsystem can effectively hide data access overhead. With its 8-way SIMD data path and multi-SIMD parallel execution, the ePUMA architecture achieves a 45x speedup on matrix multiplication over the conventional SIMD extension.
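As a rough illustration of the kind of workload the abstract describes, the sketch below splits a matrix product into output tiles, hands the tiles round-robin to a set of SIMD co-processors, and computes each tile eight output columns at a time to mirror an 8-way data path. The tile size, the number of co-processors, the round-robin scheduling, and all function names are assumptions made for this example; the abstract does not specify how ePUMA partitions or schedules the work.

```python
# Illustrative sketch only: tile size, worker count and scheduling are
# assumptions for this example, not the ePUMA design from the paper.

LANES = 8  # width of the SIMD data path mentioned in the abstract


def tile_ranges(n, tile):
    """Yield (start, stop) index pairs covering range(n) in steps of `tile`."""
    for start in range(0, n, tile):
        yield start, min(start + tile, n)


def assign_tiles(rows, cols, tile, num_simd):
    """Round-robin the output tiles of a rows x cols matrix over co-processors."""
    work = [[] for _ in range(num_simd)]
    idx = 0
    for r in tile_ranges(rows, tile):
        for c in tile_ranges(cols, tile):
            work[idx % num_simd].append((r, c))
            idx += 1
    return work


def multiply_tile(a, b, c, row_range, col_range):
    """Compute one output tile of c = a * b, up to LANES columns at a time."""
    inner = len(b)
    for i in range(*row_range):
        for j0 in range(col_range[0], col_range[1], LANES):
            j1 = min(j0 + LANES, col_range[1])
            acc = [0] * (j1 - j0)  # models one vector accumulator register
            for k in range(inner):
                a_ik = a[i][k]  # scalar broadcast across the lanes
                row_b = b[k]
                for lane in range(j1 - j0):
                    acc[lane] += a_ik * row_b[j0 + lane]
            c[i][j0:j1] = acc


def tiled_matmul(a, b, tile=16, num_simd=4):
    """Multiply a (m x k) by b (k x n), one hypothetical co-processor at a time."""
    m, n = len(a), len(b[0])
    c = [[0] * n for _ in range(m)]
    for worker_tiles in assign_tiles(m, n, tile, num_simd):
        for row_range, col_range in worker_tiles:
            multiply_tile(a, b, c, row_range, col_range)
    return c
```

The sketch runs the per-worker tile lists one after another; on a real multi-SIMD machine each list would execute concurrently, with the memory subsystem loading the next tile while the current one is being computed.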
Similar resources
On The Design of High Performance Reconfigurable DSP processor using FPGA
In this paper, a high performance reconfigurable combined architecture of Discrete Wavelet Transform (DWT), Matrix Multiplication and Fast Fourier Transform is presented. This reduces area and is cost-effective. In the proposed DWT architecture, the input data are separated into even and odd samples, and both are input in parallel. This gives faster DWT operation than conven...
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication which could run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
Implementation of Sparse Matrix Arithmetic on a DSP Processor
The paper presents a method for sparse matrix multiplication on a DSP processor. Its high efficiency is a consequence of the proposed pseudo-random data memory access and the parallelism of the multifunctional instructions of a DSP. Sparse matrix multiplication is implemented as linear expanded DSP code automatically generated by a specially designed program. The method is applied to predictive vecto...
BREAKING NEW GROUNDS OVER 3000 M MAC/s: A BROADBAND MOBILE MULTIMEDIA MODEM DSP
Future DSP architectures need to be developed for paving the way into a generation of high scale signal processing power, i.e. greater than 1000M MAC/s (multiply accumulate per second). This can only be reached today by introducing a new parallel processing paradigm into DSP architecture. Here we show that this requirement can be achieved already today at moderate clock speeds (100MHz) based on...
Publication date: 2009